Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

Authors

Hairong Liu

Zhenyao Zhu

Xiangang Li

Sanjeev Satheesh

Abstract

Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal performance when modeling sequences. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. We demonstrate that the proposed Gram-CTC improves on CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that with Gram-CTC we can outperform the state-of-the-art on a standard speech benchmark.
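
As a rough illustration of what "decomposition of a target sequence" means in the abstract, the sketch below enumerates every way to split a target string into units drawn from a gram set. It is not the Gram-CTC loss itself and is not taken from the paper: the function name `decompositions` and the toy gram set are hypothetical, chosen only to show that one target admits several decompositions once multi-character grams are allowed.

```python
from typing import List, Set


def decompositions(target: str, grams: Set[str]) -> List[List[str]]:
    """Return every way to split `target` into consecutive grams from `grams`."""
    if not target:
        # The empty string has exactly one decomposition: the empty one.
        return [[]]
    results: List[List[str]] = []
    max_len = max((len(g) for g in grams), default=0)
    # Try every prefix length that could still be a gram.
    for n in range(1, min(max_len, len(target)) + 1):
        prefix = target[:n]
        if prefix in grams:
            for rest in decompositions(target[n:], grams):
                results.append([prefix] + rest)
    return results


# Hypothetical gram set: the single characters plus two bigrams.
grams = {"c", "a", "t", "ca", "at"}
for d in decompositions("cat", grams):
    print(d)
```

With only single characters in the gram set, the target has exactly one decomposition, which corresponds to the fixed-decomposition setting the abstract criticizes; adding bigrams yields several candidates among which a model could learn to choose.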


Similar resources

Multisyn: Open-domain unit selection for the Festival speech synthesis system

We present the implementation and evaluation of an open-domain unit selection speech synthesis engine designed to be flexible enough to encourage further unit selection research and allow rapid voice development by users with minimal speech synthesis knowledge and experience. We address the issues of automatically processing speech data into a usable voice using automatic segmentation technique...


Unit selection synthesis database development using utterance verification

Accurate annotation of the unit inventory database is of vital importance to the quality of unit selection text-to-speech synthesis. The time consuming manual work involved in database development limits the ability to produce new voices quickly and at low cost. Automatic annotation is therefore more and more in use. Misalignments due to mismatch between the predicted and pronounced unit sequen...


Comparison of Decoding Strategies for CTC Acoustic Models

Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols. Output symbols are conditionally independent of each other under CTC loss, so a language model (LM) can be incorporate...


Automatic Prosody Labelling of read Norwegian

In this paper we present initial work on a method for automatic stress and boundary labelling of read East Norwegian. The context of this work is automatic corpus annotation for unit selection speech synthesis. A phonological model of Norwegian prosody is described. The identification of syllable stress and major intonational boundaries are key prosodic events for building a prosodic description...


Join Cost for Unit Selection Speech Synthesis

In unit-selection speech synthesis systems, synthetic speech is produced by concatenating speech units selected from a large database, or inventory, which contains many instances of each speech unit with varied prosodic and spectral characteristics. Hence, by selecting an appropriate sequence of units, it is possible to synthesize highly natural-sounding speech. The selection of the best unit s...




Publication date: 2017
